Author: Lucas Lherbier.
Last update: June 8th, 2020.
This notebook is part of an ongoing personal project: implementing a Python server, as an application, that classifies pictures of famous athletes. To do so, I build a deep learning model that classifies pictures.
The classifier is developed with the Keras library.
%matplotlib inline
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.preprocessing import image
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import EarlyStopping, TensorBoard
import os, shutil
import pandas as pd
import numpy as np
import seaborn as sn
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from sklearn.metrics import confusion_matrix
import random as rd
import timeit
from datetime import datetime
tf.__version__
We specify the values of the model's hyper-parameters.
image_size = 200
image_width, image_height = image_size, image_size
nb_epochs = 40
batch_size = 20
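test_size = 40 # pictures per athlete left for the test set (440 - 350 - 50)
input_shape = (image_width, image_height, 3)
train_size = 350 # pictures per athlete used for training
val_size = 50 # pictures per athlete used for validation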
Once the picture dataset has been built from Google Images, we need to create the training, validation and test sets.
The train folder should contain $n$ folders (2 for the moment), each containing the images of the corresponding class.
sports_list = ['james', 'nadal'] # list of the athletes to be classified
base_dir = 'D:/Documents/sport'
# Directories for our training, validation and test splits
train_dir = os.path.join(base_dir, 'train')
validation_dir = os.path.join(base_dir, 'validation')
test_dir = os.path.join(base_dir, 'test')
os.mkdir(train_dir)
os.mkdir(validation_dir)
os.mkdir(test_dir)
# Create the directories for the training and validation pictures
for i in sports_list:
    train_i_dir = os.path.join(train_dir, i)
    validation_i_dir = os.path.join(validation_dir, i)
    os.mkdir(train_i_dir)
    os.mkdir(validation_i_dir)
# Create the directory for the test pictures (a single sub-folder, without class labels)
os.mkdir(os.path.join(test_dir, 'test'))
train_l = rd.sample(range(440), train_size)
not_train = [i for i in range(440) if i not in train_l]
val_l = rd.sample(not_train, val_size)
test_l = [i for i in not_train if i not in val_l]
# Copy the randomly selected training images to train/<athlete>
print('Copying images to train folder')
for i in sports_list:
    fnames = [str(i) + '({}).jpg'.format(j + 1) for j in train_l]
    for fname in fnames:
        src = base_dir + '/' + str(i) + '/' + fname
        dst = base_dir + '/train/' + str(i) + '/' + fname
        shutil.copyfile(src, dst)
# Copy the validation images to validation/<athlete>
print('Copying images to validation folder')
for i in sports_list:
    fnames = [str(i) + '({}).jpg'.format(j + 1) for j in val_l]
    for fname in fnames:
        src = base_dir + '/' + str(i) + '/' + fname
        dst = base_dir + '/validation/' + str(i) + '/' + fname
        shutil.copyfile(src, dst)
# Copy the remaining images to test/test (no per-athlete sub-folders)
print('Copying images to test folder')
for i in sports_list:
    fnames = [str(i) + '({}).jpg'.format(j + 1) for j in test_l]
    for fname in fnames:
        src = base_dir + '/' + str(i) + '/' + fname
        dst = base_dir + '/test/test/' + fname
        shutil.copyfile(src, dst)
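The random split above can be packaged as a small reusable function. This is a minimal sketch, not the notebook's code: the name `split_indices` and the `seed` argument are hypothetical additions (the original draws the indices without a seed, so each run produces a different split). It uses set differences instead of `i not in list` checks, which keeps the work linear in the number of pictures; with only 440 pictures per athlete, either approach is fast enough.

```python
import random

def split_indices(n_pictures, train_size, val_size, seed=None):
    # Draw disjoint train / validation / test index sets, like the cell above.
    # `seed` is a hypothetical addition for reproducibility.
    rng = random.Random(seed)
    train = rng.sample(range(n_pictures), train_size)
    remaining = sorted(set(range(n_pictures)) - set(train))
    val = rng.sample(remaining, val_size)
    test = sorted(set(remaining) - set(val))  # whatever is left goes to the test set
    return train, val, test

train_idx, val_idx, test_idx = split_indices(440, 350, 50, seed=0)
print(len(train_idx), len(val_idx), len(test_idx))  # 350 50 40
```

Because the same indices are reused for every athlete, each class ends up with 350 training, 50 validation and 40 test pictures.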
It is interesting to visualize a few photos of each class. The following cells display 9 random pictures of each athlete.
def plot_pictures(athlete):
    """ Plot 9 random pictures of the athlete """
    path = os.path.join(base_dir, athlete)
    fig = plt.figure(figsize=(13, 13))
    for i, filename in enumerate(rd.sample(os.listdir(path), 9)):
        img_array = mpimg.imread(os.path.join(path, filename)) # load image pixels
        fig.add_subplot(3, 3, i + 1)
        plt.imshow(img_array)
    plt.show()
plot_pictures('nadal')
plot_pictures('james')
The deep learning model developed below uses convnets. A convolutional neural network (CNN) is a class of deep neural networks suited to analyzing visual imagery. I will not present the theory here because many good explanations are available online, such as this article.
The convnet takes as input tensors of shape (image_height, image_width, image_channels), as specified previously. The model is a stack of alternating Conv2D and MaxPooling2D layers. The 3D output of the last pooling layer is then flattened to 1D and fed into a densely connected classifier made of a stack of Dense layers.
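The shape arithmetic of this stack can be checked by hand. The sketch below assumes what the model defined next uses: 3x3 convolutions with Keras's default `padding='valid'` and stride 1, each followed by a 2x2 max pooling, on 200x200 inputs. The helper `conv_stack_shapes` is hypothetical, written only to trace the sizes.

```python
def conv_stack_shapes(size, filters, kernel=3, pool=2):
    # Follow the spatial size through Conv2D(kernel x kernel, padding='valid',
    # stride 1) followed by MaxPooling2D((pool, pool)), block by block.
    shapes = []
    for n_filters in filters:
        size = size - kernel + 1  # a 'valid' 3x3 convolution trims 2 pixels
        size = size // pool       # 2x2 max pooling halves the size, rounding down
        shapes.append((size, size, n_filters))
    return shapes

shapes = conv_stack_shapes(200, [32, 64, 128, 128, 128])
print(shapes)
# [(99, 99, 32), (48, 48, 64), (23, 23, 128), (10, 10, 128), (4, 4, 128)]
h, w, c = shapes[-1]
print(h * w * c)  # 2048 values reach the Flatten layer
```

Each block roughly halves the resolution while the number of filters grows, so the Dense layers only see a compact 4x4x128 summary of each picture.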
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu',
input_shape=input_shape))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dropout(0.2))
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
To better understand how our model is built, we display its architecture.
model.summary()
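The parameter counts printed by `model.summary()` follow from simple formulas: a Conv2D layer has `k * k * in_channels * out_channels` weights plus one bias per filter, and a Dense layer has `in_units * out_units` weights plus one bias per unit. The sketch below recomputes the total for this model, assuming 200x200 RGB inputs, which leave a 4x4x128 map before `Flatten`. The helpers are illustrative, not part of Keras.

```python
def conv2d_params(in_channels, out_channels, kernel=3):
    # one kernel x kernel x in_channels weight tensor per filter, plus one bias per filter
    return kernel * kernel * in_channels * out_channels + out_channels

def dense_params(in_units, out_units):
    # full weight matrix plus one bias per output unit
    return in_units * out_units + out_units

channels = [3, 32, 64, 128, 128, 128]
conv_total = sum(conv2d_params(c_in, c_out) for c_in, c_out in zip(channels, channels[1:]))
dense_total = dense_params(4 * 4 * 128, 512) + dense_params(512, 1)
total = conv_total + dense_total  # pooling, Flatten and Dropout add no weights
print(total)  # 1438017
```

Roughly three quarters of the weights sit in the first Dense layer, which is one reason the Dropout layer is placed right before it.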
model.compile(loss='binary_crossentropy',
optimizer=optimizers.RMSprop(learning_rate=1e-4), # `lr` is deprecated in TF 2.x
metrics=['acc'])
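With a single sigmoid output, `binary_crossentropy` compares the predicted probability $p$ of class 1 with the 0/1 label $y$ through $-[y\log p + (1-y)\log(1-p)]$, averaged over the batch. A NumPy sketch of that formula follows; the clipping constant `eps=1e-7` is an assumption chosen to keep `log` finite (Keras applies a similar clipping internally, but the details may differ between versions).

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Mean over samples of -[y*log(p) + (1 - y)*log(1 - p)].
    # p is clipped to [eps, 1 - eps] so that log() never receives 0.
    y = np.asarray(y_true, dtype=np.float64)
    p = np.clip(np.asarray(y_prob, dtype=np.float64), eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

print(binary_crossentropy([1, 0], [0.9, 0.1]))  # about 0.1054, i.e. -log(0.9)
print(binary_crossentropy([1, 0], [0.1, 0.9]))  # about 2.3026, i.e. -log(0.1)
```

Confident wrong answers are penalized much more heavily than confident right ones are rewarded, which is what pushes the sigmoid output towards the correct side of 0.5.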
A neural network can only take numbers as input, and an image is nothing but a matrix of pixel values. The data must therefore be formatted into appropriately pre-processed floating-point tensors before being fed into the network. Pre-processing consists of several steps: decoding the JPG content into RGB grids of pixels, converting these into floating-point tensors, and rescaling the pixel values from $[0, 255]$ to the $[0, 1]$ interval.
To do that, we use the ImageDataGenerator class from the Keras module keras.preprocessing.image, which lets us quickly set up Python generators that automatically turn image files on disk into batches of pre-processed tensors.
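For a single decoded picture, the rescaling step amounts to a type conversion and a division. The sketch below uses a random array as a stand-in for a decoded JPG (an assumption, so the example runs without image files) and adds the leading batch axis that the generators produce; `to_model_input` is a hypothetical helper, not a Keras function.

```python
import numpy as np

def to_model_input(pixels):
    # pixels: decoded RGB picture as a (height, width, 3) uint8 array in [0, 255].
    # Returns a float32 batch of one picture with values in [0, 1],
    # roughly what rescale=1./255 does to each image.
    x = pixels.astype(np.float32) / 255.0
    return x[np.newaxis, ...]  # shape (1, height, width, 3)

fake_picture = np.random.randint(0, 256, size=(200, 200, 3), dtype=np.uint8)
batch = to_model_input(fake_picture)
print(batch.shape, batch.dtype)  # (1, 200, 200, 3) float32
```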
# All images will be rescaled by 1./255
train_data_generator = ImageDataGenerator(rescale=1./255)
validation_data_generator = ImageDataGenerator(rescale=1./255)
test_data_generator = ImageDataGenerator(rescale=1./255)
train_generator = train_data_generator.flow_from_directory(
train_dir, # target directory
target_size=(image_width, image_height), # resize the images
batch_size= batch_size,
class_mode='binary')
validation_generator = validation_data_generator.flow_from_directory(
validation_dir,
target_size=(image_width, image_height),
batch_size= batch_size,
class_mode='binary')
test_generator = test_data_generator.flow_from_directory(
test_dir,
target_size=(image_width, image_height),
batch_size=1,
class_mode=None,
shuffle=False) # keep the sorted file order so predictions line up with test_generator.filenames
callback = [EarlyStopping(monitor='val_acc', mode='max', verbose=1, patience=2)]
history = model.fit(
train_generator,
steps_per_epoch= len(train_generator.filenames)//batch_size,
epochs=20,
validation_data=validation_generator,
validation_steps=len(validation_generator.filenames)//batch_size,
callbacks=callback)
model.save('james_vs_nadal_v1.h5')
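The callback `EarlyStopping(monitor='val_acc', mode='max', patience=2)` stops training once the validation accuracy has failed to improve on its best value for 2 consecutive epochs. A simplified pure-Python version of that rule is sketched below; it is not Keras's implementation (which also supports `min_delta`, `baseline` and other options), and `early_stopping_epoch` is a hypothetical name.

```python
def early_stopping_epoch(val_scores, patience):
    # Simplified rule behind EarlyStopping(mode='max'): stop once the score
    # has not beaten its best value for `patience` consecutive epochs.
    # Returns the 0-based epoch at which training stops, or None.
    best = float('-inf')
    wait = 0
    for epoch, score in enumerate(val_scores):
        if score > best:
            best = score
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

print(early_stopping_epoch([0.60, 0.70, 0.70, 0.65, 0.90], patience=2))  # 3
print(early_stopping_epoch([0.50, 0.60, 0.70], patience=2))              # None
```

Note that with the default `restore_best_weights=False`, Keras keeps the weights of the last epoch it ran, not those of the best epoch, so the saved model can be slightly worse than the best validation accuracy in the history.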
def plot_results_curves(res):
    """ Plot the training and validation curves for the accuracy metric and the loss """
    acc = res['acc']
    val_acc = res['val_acc']
    loss = res['loss']
    val_loss = res['val_loss']
    epochs = range(len(acc))
    plt.plot(epochs, acc, 'bo', label='Training acc')
    plt.plot(epochs, val_acc, 'g', label='Validation acc')
    plt.title('Training and validation accuracy')
    plt.legend()
    plt.figure()
    plt.plot(epochs, loss, 'bo', label='Training loss')
    plt.plot(epochs, val_loss, 'g', label='Validation loss')
    plt.title('Training and validation loss')
    plt.legend()
    plt.show()
plot_results_curves(history.history)
def matrix_confusion(probabilities):
    """ Plot the confusion matrix """
    # Test pictures are read in sorted file order: the 40 James pictures, then the 40 Nadal pictures
    y_true = np.array([0] * 40 + [1] * 40)
    y_pred = (probabilities > 0.5).astype(int).ravel() # turn the (n, 1) sigmoid outputs into 0/1 labels
    cm = confusion_matrix(y_true, y_pred)
    ax = plt.subplot()
    sn.heatmap(cm.T, annot=True, ax=ax) # annot=True to annotate cells
    # labels, title and ticks
    ax.set_xlabel('True labels')
    ax.set_ylabel('Predicted labels')
    ax.set_title('Confusion Matrix')
    ax.xaxis.set_ticklabels(['James', 'Nadal'])
    ax.yaxis.set_ticklabels(['James', 'Nadal'])
test_generator.reset()
proba = model.predict(test_generator, steps=80) # predict_generator is deprecated in TF 2.x
matrix_confusion(proba)
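The test accuracy can be read directly from the confusion matrix, since correct predictions lie on its diagonal. The sketch below builds the matrix by counting and reuses the 0.5 threshold of `matrix_confusion`; the four probabilities are made-up values for illustration, and `confusion_and_accuracy` is a hypothetical helper. On the 80 test pictures, an accuracy of 87.5% corresponds to 70 correct predictions.

```python
import numpy as np

def confusion_and_accuracy(y_true, y_pred, n_classes=2):
    # cm[t, p] counts pictures of true class t predicted as class p
    # (rows = true labels, columns = predictions, as in
    # sklearn.metrics.confusion_matrix before the .T used for the heatmap).
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    accuracy = np.trace(cm) / cm.sum()  # correct predictions lie on the diagonal
    return cm, accuracy

probabilities = np.array([[0.2], [0.7], [0.9], [0.6]])  # made-up sigmoid outputs
y_true = np.array([0, 0, 1, 1])
y_pred = (probabilities > 0.5).astype(int).ravel()
cm, accuracy = confusion_and_accuracy(y_true, y_pred)
print(cm)        # [[1 1]
                 #  [0 2]]
print(accuracy)  # 0.75
```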
To check whether the model generalizes well to unseen data, we run it on the test pictures and visualize the results.
def visualize_test_results(probabilities, test_generator):
    """ Show the test pictures, three per row, with the predicted athlete and its probability """
    fig = plt.figure(figsize=(13, 13))
    for index, probability in enumerate(probabilities):
        image_path = test_dir + "/" + test_generator.filenames[index]
        img = mpimg.imread(image_path)
        ax = fig.add_subplot(1, 3, index % 3 + 1)
        ax.imshow(img)
        if probability > 0.5:
            ax.title.set_text("%.2f" % (probability[0] * 100) + "% Nadal")
        else:
            ax.title.set_text("%.2f" % ((1 - probability[0]) * 100) + "% James")
        if index % 3 == 2:
            plt.show()
            fig = plt.figure(figsize=(13, 13))
visualize_test_results(proba, test_generator)
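The titles above rely on how Keras numbers the classes: `flow_from_directory` assigns label indices to the class sub-folders in alphanumeric order, so `james` gets 0 and `nadal` gets 1, and the sigmoid output is the predicted probability of Nadal. A minimal sketch of that mapping and of the title logic, with hypothetical helper names:

```python
def class_indices(class_names):
    # Sub-folders are numbered in sorted order, so with
    # ['james', 'nadal'] James is class 0 and Nadal is class 1.
    return {name: index for index, name in enumerate(sorted(class_names))}

def describe_prediction(probability, class_names):
    # The sigmoid output is the probability of class 1; report the more
    # likely athlete and the confidence in percent, like the plot titles.
    names = sorted(class_names)
    if probability > 0.5:
        return names[1], probability * 100
    return names[0], (1 - probability) * 100

print(class_indices(['nadal', 'james']))              # {'james': 0, 'nadal': 1}
print(describe_prediction(0.25, ['james', 'nadal']))  # ('james', 75.0)
```

Adding a third athlete would change this picture: a single sigmoid unit can no longer encode the prediction, and the model would need a softmax output with one unit per class.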
Our picture dataset is small for one specific reason. Since it is built from Google Images requests, the wrong pictures must be cleaned out, which is a daunting task: for example, pictures showing both Nadal and Federer have to be deleted. This is done by hand, so it is not feasible to collect several thousand pictures.
However, we can use data augmentation: a technique that artificially expands the size of a training dataset by creating modified versions of its images. Because these versions are generated through random transformations that still yield believable-looking images, augmentation can improve the fitted model's ability to generalize.
The ImageDataGenerator instance configures a number of random transformations to be performed on the images.
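To make two of these transformations concrete, the NumPy sketch below applies a horizontal flip and a width shift whose uncovered columns are filled by repeating the nearest edge column, which is the idea behind `fill_mode='nearest'`. It is an illustration, not Keras's implementation, which draws random parameters and applies a full affine transform; with `width_shift_range=0.2`, the shift can reach 20% of the width, about 40 pixels for 200-pixel pictures. The helper names are hypothetical.

```python
import numpy as np

def horizontal_flip(img):
    # img has shape (height, width, channels): mirror it left-to-right
    return img[:, ::-1, :]

def shift_width(img, dx):
    # Shift the picture dx pixels to the right (to the left if dx < 0) and
    # fill the uncovered columns by repeating the nearest edge column.
    width = img.shape[1]
    pad = abs(dx)
    padded = np.pad(img, ((0, 0), (pad, pad), (0, 0)), mode='edge')
    start = pad - dx
    return padded[:, start:start + width, :]

row = np.arange(4).reshape(1, 4, 1)   # one-pixel-high picture with columns 0 1 2 3
print(horizontal_flip(row)[0, :, 0])  # [3 2 1 0]
print(shift_width(row, 1)[0, :, 0])   # [0 0 1 2]
print(shift_width(row, -1)[0, :, 0])  # [1 2 3 3]
```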
datagen = ImageDataGenerator(
rotation_range=45,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.3,
zoom_range=0.1,
horizontal_flip=True,
fill_mode='nearest')
train_nadal_dir = os.path.join(train_dir, 'nadal')
fnames = [os.path.join(train_nadal_dir, fname) for fname in os.listdir(train_nadal_dir)]
img_path = rd.choice(fnames) # choose one training picture at random
img = image.load_img(img_path, target_size=(image_width, image_height))
x = image.img_to_array(img)
x = x.reshape((1,) + x.shape) # add a batch axis: (1, height, width, 3)
i = 0
for batch in datagen.flow(x, batch_size=1): # .flow() generates batches of randomly transformed images
    plt.figure(i)
    imgplot = plt.imshow(image.array_to_img(batch[0]))
    i += 1
    if i % 4 == 0:
        break # the generator loops forever, so stop after 4 augmented pictures
plt.show()
model_augmented = models.Sequential()
model_augmented.add(layers.Conv2D(32, (3, 3), activation='relu',
input_shape=(image_width, image_height, 3)))
model_augmented.add(layers.MaxPooling2D((2, 2)))
model_augmented.add(layers.Conv2D(64, (3, 3), activation='relu'))
model_augmented.add(layers.MaxPooling2D((2, 2)))
model_augmented.add(layers.Conv2D(128, (3, 3), activation='relu'))
model_augmented.add(layers.MaxPooling2D((2, 2)))
model_augmented.add(layers.Conv2D(128, (3, 3), activation='relu'))
model_augmented.add(layers.MaxPooling2D((2, 2)))
model_augmented.add(layers.Conv2D(128, (3, 3), activation='relu'))
model_augmented.add(layers.MaxPooling2D((2, 2)))
model_augmented.add(layers.Flatten())
model_augmented.add(layers.Dropout(0.2))
model_augmented.add(layers.Dense(512, activation='relu'))
model_augmented.add(layers.Dense(1, activation='sigmoid'))
model_augmented.compile(loss='binary_crossentropy',
optimizer=optimizers.RMSprop(learning_rate=1e-4), # `lr` is deprecated in TF 2.x
metrics=['acc'])
train_datagen_augmented = ImageDataGenerator(
rescale=1./255,
rotation_range=40,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True,)
# Note that the validation data should not be augmented!
train_generator_augmented = train_datagen_augmented.flow_from_directory(
train_dir, # target directory
target_size=(image_width, image_height), # resize the images
batch_size= batch_size,
class_mode='binary')
validation_generator_augmented = validation_data_generator.flow_from_directory(
validation_dir,
target_size=(image_width, image_height),
batch_size= batch_size,
class_mode='binary')
callback = [EarlyStopping(monitor='val_acc', mode='max', verbose=1, patience=3)]
history_augmented = model_augmented.fit(
train_generator_augmented,
steps_per_epoch= len(train_generator_augmented.filenames)//batch_size,
epochs=20,
validation_data=validation_generator_augmented,
validation_steps=len(validation_generator_augmented.filenames)//batch_size,
callbacks = callback)
model_augmented.save('james_vs_nadal_v2.h5')
plot_results_curves(history_augmented.history)
test_generator.reset()
probabilities = model_augmented.predict(test_generator, steps=80) # predict_generator is deprecated in TF 2.x
matrix_confusion(probabilities)
visualize_test_results(probabilities, test_generator)
A common approach to training deep learning models on small image datasets is to use a pre-trained network, that is, a saved network previously trained on a large dataset. If this original dataset is general enough, the spatial feature hierarchy learned by the pre-trained network can act as a generic feature extractor and improve our model.
from tensorflow.keras.applications import VGG16 # use tf.keras, like the rest of the notebook
conv_base = VGG16(weights='imagenet',
include_top=False,
input_shape=input_shape)
conv_base.summary()
We use the representations learned by the pre-trained network to extract interesting features from new samples. These features are then run through a new classifier, which is trained from scratch.
Now, we run the convolutional base over our dataset, save its output to a NumPy array, and use this data as input to a densely connected classifier.
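Before extracting features, it helps to know the shape of the convolutional base's output, which is what `conv_base.get_layer(index = -1).output_shape[1:]` returns when `include_top=False`. VGG16's 3x3 convolutions use `padding='same'`, so only the 2x2 max pooling closing each of its five blocks shrinks the feature map. The hypothetical helper below traces that for 200x200 inputs:

```python
def vgg16_base_output_shape(size, n_blocks=5, channels=512):
    # 'same' 3x3 convolutions keep the spatial size; the 2x2 max pooling
    # at the end of each of the 5 blocks halves it (rounding down).
    for _ in range(n_blocks):
        size //= 2
    return (size, size, channels)

out_h, out_w, out_c = vgg16_base_output_shape(200)
print((out_h, out_w, out_c))   # (6, 6, 512)
print(out_h * out_w * out_c)   # 18432 features per picture after flattening
```

Each picture is thus summarized by 18,432 numbers, which is the input size of the densely connected classifier trained on top.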
(size1, size2, size3) = conv_base.get_layer(index = -1).output_shape[1:]
def extract_features(directory, sample_count, batch_size=10, shuffle=True):
    """ Run the pre-trained convolutional base over every picture in `directory` """
    features = np.zeros(shape=(sample_count, size1, size2, size3))
    labels = np.zeros(shape=(sample_count))
    generator = train_data_generator.flow_from_directory(
        directory,
        target_size=(image_width, image_height),
        batch_size=batch_size,
        class_mode='binary',
        shuffle=shuffle)
    i = 0
    for inputs_batch, labels_batch in generator:
        features_batch = conv_base.predict(inputs_batch)
        features[i * batch_size : (i + 1) * batch_size] = features_batch
        labels[i * batch_size : (i + 1) * batch_size] = labels_batch
        i += 1
        if i * batch_size >= sample_count:
            # Note that since generators yield data indefinitely in a loop,
            # we must `break` after every image has been seen once.
            break
    return features, labels
train_features, train_labels = extract_features(train_dir, train_size*2)
validation_features, validation_labels = extract_features(validation_dir, val_size*2)
# shuffle=False keeps the test pictures in class order (40 James, then 40 Nadal),
# which is the order matrix_confusion assumes
test_features, test_labels = extract_features(os.path.join(base_dir, 'test2'), 40*2, 1, shuffle=False)
The extracted features go to a densely connected classifier, so we must flatten them.
train_features = np.reshape(train_features, (train_size * 2, size1 * size2 * size3))
validation_features = np.reshape(validation_features, (val_size * 2, size1 * size2 * size3))
test_features = np.reshape(test_features, (40 * 2, size1 * size2 * size3))
model_pretrained = models.Sequential()
model_pretrained.add(layers.Dropout(0.3, input_shape=(size1*size2*size3,))) # the first layer declares the input size
model_pretrained.add(layers.Dense(512, activation='relu'))
model_pretrained.add(layers.Dense(1, activation='sigmoid'))
model_pretrained.compile(optimizer=optimizers.RMSprop(learning_rate=1e-4), # `lr` is deprecated in TF 2.x
loss='binary_crossentropy',
metrics=['acc'])
callback = [EarlyStopping(monitor='val_acc', mode='max', verbose=1, patience=3)]
history = model_pretrained.fit(train_features, train_labels,
batch_size=batch_size, # NumPy inputs: let Keras derive the steps from the batch size
epochs=20,
validation_data=(validation_features, validation_labels),
callbacks = callback)
model_pretrained.save('james_vs_nadal_v3.h5')
plot_results_curves(history.history)
probabilities = model_pretrained.predict(test_features)
matrix_confusion(probabilities)
In this notebook, I presented my approach to building a picture classifier. Specifically, my work was divided into several parts:
Looking at the different metrics, the best models are the original one and the one using data augmentation. The confusion matrices show that their accuracies on the test dataset are 87.5% and 85% respectively. I do not reach an accuracy of around 95%, for several reasons:
the quality of the pictures. Usually, for face recognition problems, we use identity photos showing only the face, such as in the following picture.
Here, the color of the athletes' jerseys, the lighting and even the size of the face in the picture vary a lot. For example, looking at the picture below, we understand why it is too difficult for the algorithm to recognize the athletes perfectly.
Deep learning models work on RGB pixels, so it is very hard for them not to confuse a picture of Nadal during an interview with one of James, who always plays indoors against a dark background.
Clearly, to improve the model's performance, we need much more data, focused more on the athletes' faces. In my work, I extracted the image dataset myself from Google Images and then cleaned it by deleting the pictures showing teammates or the wrong athletes. This data pre-processing is time-consuming, and it would be interesting to find a way to automate it.
Now I better understand why people say that data is the oil of machine learning.